1. INTRODUCTION

Computer vision is a branch of artificial intelligence (AI) that trains computers to analyze data from
images [1], [2]. It has been used in many domains, such as medicine [3], [4], education [5], banking [6], [7],
logistics and transportation [8], and agriculture [9]. Moreover, this technology is increasingly applied
in intelligent systems, the internet of things (IoT), cloud computing, robotics [10]-[12], and IoT-based vehicle
verification systems [13].

Computer vision is also used in document processing systems to extract data from handwritten
or printed text. Optical character recognition (OCR) plays an important role in extracting characters from
paper-based documents. Integrating OCR into a document processing application increases data quality,
efficiency, and speed, and reduces human error in data extraction [14]. Governments can also use OCR
to prevent crime and enforce the law, for example in a passport identification system for immigration
control or a vehicle license plate system for detecting and controlling traffic offences [10]. Developing
OCR applications [15], [16] is now much more convenient because multiple tools, both free of charge and
commercial, are available, and these tools support many natural languages. Furthermore, the tools on the
market compete to provide APIs and libraries with better recognition quality. Examples include Google
Cloud Vision, Tesseract, Huawei Cloud Vision, matrix laboratory OCR (MATLAB-OCR), and OpenCV's EAST
detector.

This study compares the performance of the Google Cloud Vision API [17] and Tesseract [18] in
recognizing images of Thai vehicle registration certificates and extracting the car information they contain.
Thai vehicle registration certificates are currently still issued in a hard-copy format, so government officers,
police, and people who want to buy or sell vehicles cannot link a certificate to a database automatically.
Users must key the data into a system by hand, which is error-prone. This paper also studies existing
pre-processing techniques for increasing the accuracy of Google Cloud Vision and Tesseract.

The rest of the paper is organized as follows: section 2 presents a literature review of the relevant 
issues. Section 3 presents the research method, including the study of tools and techniques for enhancing the 
process quality and the design and development of the proposed system. Section 4 presents the results of the 
comparison and discussion. The last section presents conclusions and suggestions for future work. 


2. LITERATURE REVIEW 

Computer vision uses deep learning to create neural networks that guide image processing and
analysis [19]. Fully trained computer vision models can detect and recognize objects and people and even
track movement. The tools used in computer vision include mathematics, geometry, linear algebra, statistics
and operational research, functional analysis, and intelligent models. They are used to create algorithms for
segmenting and grouping images so that computers can understand a scene from its features. OCR is a
process for recognizing and translating text that appears in images, documents, symbols, or marked scenes
into text characters that a computer can access and process. OCR can convert images containing handwritten
or printed text [14]. OCR technology increases the data entry speed and reduces human error. In addition,
OCR can improve retrieval and file handling performance on a storage device. Printed text is typewritten
text from any device, such as a computer or typewriter. Basic OCR processes printed text one character at a
time using pattern matching and feature analysis. In contrast, intelligent character recognition (ICR) may use
machine-learning techniques to process the text. Handwritten text is handwritten input from any source, such
as paper documents or photos. Offline recognition processes a static document, whereas online recognition
analyzes handwriting movement. OCR consists of the following six phases [19]-[22]:

— Optical scanning captures images from various sources, such as scanners or cameras. 

— Pre-processing improves image quality. Examples of pre-processing techniques include binarization, 
sharpening, and image adjustments. 

— Character segmentation passes the character to the recognition engine. The most straightforward 
techniques used for this are connected component analysis and projection profiles. 

— Feature extraction extracts and recognizes different features in the image. 

— Character classification maps features to different categories using classification techniques. 

— Post-processing: After classification, the results are not 100% correct, especially for complex languages. 
Post-processing improves the accuracy of OCR systems. These techniques utilise natural language 
processing and geometric and linguistic contexts to correct errors in OCR results. 
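
The projection profile technique named in the character segmentation phase can be illustrated with a
minimal Python sketch. It assumes an already binarized image stored as nested lists (1 for ink, 0 for
background); the function names and the toy image are hypothetical and are not taken from any of the tools
discussed in this paper.

```python
from typing import List, Tuple


def column_profile(image: List[List[int]]) -> List[int]:
    """Count the ink pixels (value 1) in every column of a binary image."""
    return [sum(column) for column in zip(*image)]


def segment_columns(image: List[List[int]]) -> List[Tuple[int, int]]:
    """Return (start, end) column ranges, end exclusive, that contain ink.

    A run of columns with a non-zero profile is treated as one segment;
    columns with a zero profile are treated as gaps between segments.
    """
    segments: List[Tuple[int, int]] = []
    start = None
    for x, count in enumerate(column_profile(image)):
        if count > 0 and start is None:
            start = x                      # a new segment begins
        elif count == 0 and start is not None:
            segments.append((start, x))    # the segment ended at a gap
            start = None
    if start is not None:                  # segment touching the right edge
        segments.append((start, len(image[0])))
    return segments


# Toy 3x8 binary image with three separated ink regions (hypothetical data).
TOY_IMAGE = [
    [1, 1, 0, 0, 1, 0, 1, 1],
    [1, 0, 0, 0, 1, 0, 1, 0],
    [1, 1, 0, 0, 1, 0, 1, 1],
]

print(segment_columns(TOY_IMAGE))
```

A vertical projection of this kind splits only at empty columns, so marks stacked above or below a base
character stay in the same segment as that character.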

OCR has been used in many different domains. For example, Patel and Patel applied the Tesseract
OCR and Transym OCR tools to recognize vehicle number plates [23]. In another case related to autonomous
vehicles, Sugadev et al. [11] applied the Google Cloud Vision API to process images, improve the obstacle
detection accuracy, and identify obstacles for rough-terrain autonomous robots. Many domains also use this
technology to increase the data entry speed and reduce human error in extracting data in document
processing systems [24], [25]. Such technology has been increasingly applied in intelligent systems, IoT,
cloud computing, and robotics, for example in signboard text translation [26] and IoT-based vehicle
verification systems [27]. Since 1969, when the US Army introduced the first application to use OCR
technology on IBM 360 computers, developing OCR applications has become much more convenient because
many development tools have been introduced to support OCR developers. Today, tools support both offline
and online use. Furthermore, the tools compete to provide new APIs and libraries with better recognition
quality. Such tools include Google Cloud Vision [17], Tesseract [18], MATLAB [28], and OpenCV [29]. The
Google Cloud Vision API is a Google Cloud Platform service that leverages Google's technology to facilitate
image analysis. This service helps to understand image content by using machine learning tools developed by
Google. The main capabilities of the Google Cloud Vision API include the following [17]:
— Detecting objects within the image (entity detection)

— Reading text within images (OCR)

— Checking for inappropriate content, such as explicit images, using Google SafeSearch

— Finding faces within the photo

— Identifying the location shown in the image (landmark detection)

— Detecting a logo inside the image (logo detection)

To detect text in an image using the Google Cloud Vision API, the original image is first uploaded
to Cloud Storage. This triggers a cloud function, which uses the Vision API to extract the text and detect
the source language. The text can then be queued for translation by publishing a message to Pub/Sub. A
second cloud function uses the Translation API to translate the text in the translation queue and sends the
results to the result queue. Finally, another cloud function saves the translated text from the result queue to
Cloud Storage.

Tesseract is an open-source OCR tool that began as a PhD research project at HP Labs to
support HP flatbed scanners [18]. It is available under the Apache 2.0 license. It can be used directly or
through an API to extract printed and handwritten text from images. Tesseract supports more than 100
languages. It has more than 40 third-party partners, such as TesseractStudio.Net, PDF OCR X, and
TaxWorkFlow. In addition, Tesseract is compatible with many programming languages, such as C, C++, and
Python. At the time of writing, the latest stable version of Tesseract is 4.0. When the user inputs an image,
it is binarized, the outlines of its connected components are extracted, and the outlines are gathered
together, purely by nesting, into blobs. Recognition then proceeds as a two-part process. In the first part,
the recognition model attempts to recognize each word, and the words it recognizes well train an adaptive
classifier, which can then recognize text lower down the page more accurately. In the second part,
Tesseract 4.0 uses a convolutional neural network (CNN) to recognize an image containing a single character,
while recurrent neural networks (RNNs) with long short-term memory (LSTM) handle sequences of characters.
A comparison of the basic features and architecture of Tesseract and the Google Cloud Vision API is shown
in Table 1.


Table 1. Comparison of the basic features between the Tesseract and the Google Cloud Vision API 


Feature                      Tesseract                         Google Cloud Vision API
Latest stable version        4                                 Unknown
Technology                   CNN, LSTM, VGSL                   Unknown
Cost                         Free                              Pay per use
Software type                Open source                       Cloud service
Architecture                 On web/on device                  On web
SDK                          Yes                               Yes
License                      Apache                            Proprietary
Supported OS                 macOS, Windows, Unix, Linux       Browser
Encoder                      Base64                            Base64
Programming language         C, C++, Python                    C#, Go, Java, Node.js, PHP, Ruby, Python
Font                         Printed and handwritten script    Printed and handwritten script
Supported languages          Printed=100+, Script=35+          Printed=200+, Script=200+
Thai support                 Printed and handwritten script    Printed and handwritten script
Third-party                  40+                               Unknown
Supported mobile platform    Android, iOS                      Android, iOS


Both Tesseract and the Google Cloud Vision API can recognize printed and handwritten text and
support many programming languages and third-party tools. Both use Base64, a standard binary-to-text
encoding, to transfer image data as text. Tesseract is open-source software and therefore costs little
compared with the Google Cloud Vision API, a cloud service whose cost depends on usage. However,
Tesseract can be more challenging to set up and configure on a server than the Google Cloud Vision API,
which comes with a ready-to-use environment. Moreover, the Google Cloud Vision API interoperates with
Cloud Storage, Cloud Functions, and Google translation through Pub/Sub. In contrast, Tesseract chains its
own internal modules, which may significantly affect the processing time.

The Thai script consists of 44 letters, 21 vowel symbols representing 32 vowel sounds, and four tone
marks. Thai letters can be divided into five groups according to the writing style, as shown in Table 2. The
similarity of the letters within these groups makes the Thai script difficult to recognize. In Thailand,
computer vision has been used for more than ten years in scenarios such as reading citizen ID cards [30],
translating Thai menus into Malay [31], and reading license plates [32]. Thai character recognition has also
been studied with various techniques, as shown in Table 3.
Table 2. The Thai letter, vowel, and tone mark
Thai letters, grouped by writing style: head-in type, head-out type, double-head type, broken-head type,
and headless type
Thai vowels and sounds
Thai tone marks
[The Thai glyphs listed for each group are not recoverable from the source.]


Table 3. The articles that study techniques of Thai language OCR and results 


Article                                     Method                                                        Result: Accuracy
Kraisin and Kaothanthong [33]               Method: Extreme learning machine and b=4 histogram            90%
                                            Data: 45 Thai license plates
Kitvimonrat and Watcharabutsarakham [34]    Method: Combining the EAST model and text recognition         28% to 90%
                                            and using Bi-LSTM
                                            Data: 128x64-pixel grayscale images of Thai license plates
                                            Error characters: four pairs of similar-looking characters
Kobchaisawat and Chalidabhongse [35]        Method: Convolutional neural network (CNN)                    Precision: 0.70, Recall: 0.73,
                                            Data: Standard data sets with 640x480-pixel resolution        F-Measure: 0.71
Sumetphong and Tangwongsan [36]             Method: MATLAB, an adaptive switch median filter, and         93%
                                            the 2-D Otsu thresholding method
                                            Data: 30 training pages and 10 testing pages
Thammano and Duangphasuk [37]               Method: Fuzzy ARTMAP neural network                           82.42%
                                            Data: 12 no-head fonts and eight experiments
Somboonsak [38]                             Method: Thai character clusters and longest matching          86.51%
                                            Data: 15,000 characters
Chomphuwiset [39]                           Method: Convolutional neural network (CNN)                    98%
                                            Data: 124,080 training images and 20,424 testing images
Pornpanomchai and Daveloh [40]              Method: Genetic algorithm                                     97.14%
                                            Data: 1,015 printed Thai characters
Tangwannawit and Saetang [41]               Method: Tesseract OCR engine in a mobile application          98.05% and 100%
                                            Data: 50 tickets
Jirattitichareon and Chalidabhongse [42]    Method: Laplacian of Gaussian and Gaussian mixture model      90.22%
                                            Data: 192 Thai sign images
Toolpeng et al. [43]                        Method: Service routine procedure                             MLP: 100%, SVM: 85%
                                            Data: 156 available samples
Duangphasuk and Thammano [44]               Method: Hierarchical cross-correlation ARTMAP                 88.71%
                                            Data: 32 experiments and two databases


The survey found that computer vision technology has not yet been applied to the Thai vehicle 
registration certificate. Figure 1 shows the Thai vehicle registration certificate issued by the Department of 
Land Transport [45]. This document shows the vehicle information, including the body number, color, and 
weight. The staff records the identification information of transfers, cancellations, modifications, and the 
vehicle owner’s tax history to provide up-to-date information. The document helps readers (e.g., police 
officers, registrars, or car buyers) confirm that the car in question is a registered one and has not been stolen 
or tampered with to conceal an offence. It also helps car owners maintain, inspect, and replace spare parts 
that match the vehicle’s model and design. Furthermore, a car owner can use a copy of the Thai vehicle 
registration certificate to represent the original document. However, Thai vehicle registration certificates are 
still in a hard-copy format, which cannot be automatically connected to the database system. Users must import 
the data into a system with key-ins that may be error-prone. 

The Thai vehicle registration certificate consists of four parts. Figure 1(a) is the cover page,
showing the vehicle license number and the registered province. Figure 1(b) is the car information, showing
the registration date, registration number, province, car type, car model, brand, model, year, color, engine
number, type of fuel, number of gas tanks, cylinders, cc, horsepower, weight, payload weight, and seats.
Figure 1(c) is the owner information, showing the vehicle possession, owner ID, date of birth, nationality,
address, occupant information, address, phone number, hire purchase contract and ownership, and officer and
registrar signature. Figure 1(d) is the officer recording, showing any change in the history of the car and
owner information, such as cancellation, color changes, and additional parts.
Figure 1. Four parts of the Thai vehicle registration certificate (a) cover page, (b) car information, (c) owner
information, and (d) officer recording


3. RESEARCH METHOD 

This study used 84 sample files of Thai vehicle registration certificates for testing, categorized by
image size/resolution and image characteristics as shown in Table 4. The test computer had an Intel Core
i7-10750H CPU, 16 GB (2 x 8 GB) of DDR4 2666 MHz RAM, and an NVIDIA GeForce RTX 2060 GPU with 6 GB of
GDDR6 memory. Tesseract OCR v4.1.0.20190314 and PHP 7.4.15 were installed on it to develop a web
application that sends images to Google Cloud Vision and Tesseract for recognition via their APIs, as shown
in Figures 2 and 3. This paper also applied the following three image enhancement techniques during
pre-processing: i) sharpening, which applies a 3x3 kernel; ii) contrast adjustment, which changes the
contrast of the image; and iii) brightness adjustment, which changes the brightness of the image within the
range -255 to 255.
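
The three enhancement techniques can be sketched in pure Python on a grayscale image stored as nested
lists of values from 0 to 255. This is only an illustration under stated assumptions: the paper specifies a
3x3 kernel but not its coefficients, so the common sharpening kernel below is assumed, and the contrast
formula (scaling each pixel's distance from mid-grey) is likewise an assumed, hypothetical choice.

```python
from typing import List

Image = List[List[int]]  # grayscale pixels in the range 0..255

# Assumed 3x3 sharpening kernel; the paper does not give its coefficients.
SHARPEN_KERNEL = [[0, -1, 0], [-1, 5, -1], [0, -1, 0]]


def clamp(value: float) -> int:
    """Round a value and limit it to the valid pixel range 0..255."""
    return max(0, min(255, int(round(value))))


def adjust_brightness(image: Image, offset: int) -> Image:
    """Add an offset in [-255, 255] to every pixel."""
    if not -255 <= offset <= 255:
        raise ValueError("offset must be between -255 and 255")
    return [[clamp(p + offset) for p in row] for row in image]


def adjust_contrast(image: Image, factor: float) -> Image:
    """Scale each pixel's distance from mid-grey (128) by the given factor."""
    return [[clamp(128 + factor * (p - 128)) for p in row] for row in image]


def sharpen(image: Image) -> Image:
    """Convolve with the 3x3 kernel; border pixels are copied unchanged."""
    height, width = len(image), len(image[0])
    out = [row[:] for row in image]
    for y in range(1, height - 1):
        for x in range(1, width - 1):
            acc = 0
            for ky in range(3):
                for kx in range(3):
                    acc += SHARPEN_KERNEL[ky][kx] * image[y + ky - 1][x + kx - 1]
            out[y][x] = clamp(acc)
    return out
```

Because the kernel coefficients sum to one, flat regions are left unchanged and only local differences,
such as character edges, are amplified.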


Table 4. Sample files characteristics of Thai vehicle registration certificates 


Categories                                                   Files
Image size/resolution
- Large size/high resolution (>=1024x768px/300dpi)           19
- Standard size/medium resolution (>=640x480px/120dpi)       45
- Small size/low resolution (<640x480px/72dpi)               20
Image characteristics
- Non-damaged image                                          53
- Low brightness image                                       13
- Blurred or glared image                                    4
- Torn or wrinkled image                                     2
- Improperly positioned image                                12


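use thiagoalessio\TesseractOCR\TesseractOCR;

// The language argument and the assignment target were lost from the original
// listing; 'tha' (Thai) and $text are assumed here.
$text = (new TesseractOCR($filepath))->lang('tha')->run();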


Figure 2. Sending images to be processed with the Tesseract 
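
For readers who do not use PHP, a comparable call can be sketched in Python by invoking the Tesseract
command-line program directly. The helper names are hypothetical, and the language codes and page
segmentation mode are assumptions, since the settings used in this study are not stated.

```python
import shutil
import subprocess
from typing import List, Sequence


def build_tesseract_command(
    image_path: str,
    languages: Sequence[str] = ("tha", "eng"),  # assumed language packs
    psm: int = 6,                               # assumed: single uniform text block
) -> List[str]:
    """Build the argument list for the `tesseract` command-line program.

    Passing "stdout" as the output base makes Tesseract print the text
    instead of writing it to a file.
    """
    return [
        "tesseract", image_path, "stdout",
        "-l", "+".join(languages),
        "--psm", str(psm),
    ]


def run_tesseract(image_path: str) -> str:
    """Run Tesseract on one image and return the recognized text."""
    if shutil.which("tesseract") is None:
        raise RuntimeError("the tesseract executable was not found on PATH")
    completed = subprocess.run(
        build_tesseract_command(image_path),
        capture_output=True,
        encoding="utf-8",
        check=True,
    )
    return completed.stdout


print(build_tesseract_command("certificate.png"))
```

Keeping the command construction separate from its execution makes the chosen options easy to inspect
without Tesseract being installed.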
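$arrHeader = array();
$arrHeader[] = "Content-Type: application/json";

// The string literals were lost from the original listing. The keys below follow the
// Vision API images:annotate request format; the values and $filepath are assumed.
$arrPostData = array();
$arrPostData['requests'][0]['image']['content'] = base64_encode(file_get_contents($filepath));
$arrPostData['requests'][0]['features'][0]['type'] = "TEXT_DETECTION";
$arrPostData['requests'][0]['features'][0]['maxResults'] = 1;

// $strUrl holds the Vision API endpoint and is defined outside this listing.
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, $strUrl);
curl_setopt($ch, CURLOPT_HTTPHEADER, $arrHeader);
// The option values below were illegible in the original listing and are assumed.
curl_setopt($ch, CURLOPT_HEADER, false);
curl_setopt($ch, CURLOPT_POST, true);
curl_setopt($ch, CURLOPT_POSTFIELDS, json_encode($arrPostData));
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, true);
$result = curl_exec($ch);
curl_close($ch);

// The first text annotation holds the full text detected in the image.
$data = json_decode($result);
$text = $data->responses[0]->textAnnotations[0]->description;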


Figure 3. Sending images to be processed with the Google Cloud Vision API 
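
The structure of the request and response in Figure 3 can also be sketched in Python without making a
network call. The function names are hypothetical, and the feature type `TEXT_DETECTION` is an assumption,
because the value used in this study is not legible in the original listing.

```python
import base64
import json
from typing import Any, Dict


def build_annotate_request(
    image_bytes: bytes, feature_type: str = "TEXT_DETECTION"
) -> Dict[str, Any]:
    """Build the JSON body for a Vision API images:annotate request."""
    content = base64.b64encode(image_bytes).decode("ascii")
    return {
        "requests": [
            {
                "image": {"content": content},          # Base64-encoded image
                "features": [{"type": feature_type}],   # requested analysis
            }
        ]
    }


def extract_full_text(response: Dict[str, Any]) -> str:
    """Return the description of the first text annotation, or "" if none."""
    responses = response.get("responses", [])
    if not responses:
        return ""
    annotations = responses[0].get("textAnnotations", [])
    return annotations[0].get("description", "") if annotations else ""


# Hypothetical image bytes and response, used only to show the data shapes.
payload = build_annotate_request(b"fake-image-bytes")
print(json.dumps(payload)[:60])
sample_response = {"responses": [{"textAnnotations": [{"description": "TOYOTA"}]}]}
print(extract_full_text(sample_response))
```

Returning an empty string when no annotation is present gives a simple way to flag a file as unreadable
in the evaluation that follows.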


The results were evaluated in terms of accuracy and readability. Accuracy was measured by counting
the correctly recognized words and specific terms used in the Thai vehicle registration certificate, whereas
readability was measured by the number of unreadable files. The experiment consisted of five steps:

— Test the performance of the Tesseract and the Google Cloud Vision API with each image size. 

— Test the performance of the Tesseract and the Google Cloud Vision API according to the given image 
characteristic. 

— Improve image quality with three enhancing techniques before processing and repeat the first and second 
steps. 

— Count the number of words and unreadable files in each step to identify the accuracy and readability of 
the tools. 

— Record and evaluate the results. 
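
The counting in these steps can be sketched as follows. The exact scoring rule is not given in this paper,
so the sketch makes two assumptions: accuracy is the percentage of expected terms that appear among the
whitespace-separated tokens of the OCR output, and a file is unreadable when the tool returns no text. The
function names and sample values are hypothetical.

```python
from collections import Counter
from typing import Iterable, List


def term_accuracy(expected_terms: List[str], recognized_text: str) -> float:
    """Percentage of expected terms found in the OCR output.

    Each recognized token can satisfy only one expected term, so repeated
    terms must be recognized as many times as they are expected.
    """
    if not expected_terms:
        raise ValueError("expected_terms must not be empty")
    available = Counter(recognized_text.split())
    hits = 0
    for term in expected_terms:
        if available[term] > 0:
            available[term] -= 1
            hits += 1
    return 100.0 * hits / len(expected_terms)


def count_unreadable(outputs: Iterable[str]) -> int:
    """Count OCR outputs that contain no text at all."""
    return sum(1 for text in outputs if not text.strip())


# Hypothetical ground truth and OCR output for one certificate.
expected = ["TOYOTA", "2015", "WHITE", "1496"]
recognized = "TOYOTA 2015 WH1TE 1496"
print(term_accuracy(expected, recognized))
print(count_unreadable(["some text", "", "   "]))
```

Thai text is written without spaces between words, so a real evaluation would need to compare field
values or use a Thai word segmenter rather than plain whitespace splitting.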


4. RESULTS AND DISCUSSION 
4.1. Experiment results 

The results for the 84 sample vehicle registration files showed that Google Cloud Vision was
better at reading and interpreting vehicle registration certificate information than Tesseract, with an
average accuracy of 84.43% compared with 47.02% for Tesseract, as shown in Table 5. For every file size and
resolution, the Google Cloud Vision API was markedly more accurate than Tesseract; its accuracy reached
94.72% for large size/high resolution images (>=1024x768 px/300 dpi). The small size/low resolution images
had the lowest accuracy for the Google Cloud Vision API (71.17%), whereas Tesseract scored about 41% on
both the standard and small images.

With regard to the image characteristics shown in Table 6, both Tesseract and the Google Cloud
Vision API had their lowest accuracy on images of torn or wrinkled documents, at 3.55% and 17.73%,
respectively. For images of blurred or glared documents, the accuracy was 35.05% and 55.19%, respectively.
For improperly positioned images, such as images taken from a long distance, skewed images, or partially
captured images, the Google Cloud Vision API still provided a high recognition rate (83.15%), whereas
Tesseract reached only 19.51%. The pre-processing techniques were evaluated using combinations of three
image enhancement techniques: sharpening, contrast adjustment, and brightness adjustment. Table 7 shows
that combining sharpening with brightness adjustment gave the highest accuracy for both Tesseract and the
Google Cloud Vision API. Table 8 compares the number of unreadable files for Tesseract and the Google Cloud
Vision API. Every image enhancement technique reduced the number of unreadable files, except for contrast
and sharpening with Tesseract, which left the number unchanged at 12. The Google Cloud Vision API worked
well for both the original and enhanced images.

The misinterpreted Thai characters are presented in Table 9. As can be seen from the table, the
misrecognized characters have a similar appearance to the characters they were confused with. When
considered in conjunction with the image file size and enhancement techniques, the interpretation error
rate could be reduced with larger files or higher resolutions and with the sharpening techniques, because
the increased clarity allowed the tools to distinguish the characters better.


Table 5. Accuracy of Tesseract and the Google Cloud Vision API for image file size/resolution

Image size                         Accuracy (%)
                                   Tesseract    G-Vision
Large size/high resolution         60.21        94.72
Standard size/medium resolution    41.17        79.85
Small size/low resolution          41.21        71.17
Average                            47.02        84.43


Table 6. Accuracy of Tesseract and the Google Cloud Vision API for damaged image characteristics

Image                              Accuracy (%)
                                   Tesseract    G-Vision
Low brightness image               34.11        72.44
Blurred or glared image            35.05        55.19
Torn or wrinkled image             3.55         17.73
Improperly positioned image        19.51        83.15


Table 7. Accuracy of Tesseract and the Google Cloud Vision API for enhancement techniques


Image                              Accuracy (%)
                                   Tesseract    G-Vision
Sharpening                         49.36        87.67
Contrast                           44.20        71.49
Sharpening and Brightness          52.88        88.90
Contrast and Sharpening            45.47        76.51


Table 8. Number of unreadable files for Tesseract and the Google Cloud Vision API


Image                              Unreadable (Number)
                                   Tesseract    G-Vision
Original                           12           9
Sharpening                         9            6
Contrast                           11           8
Sharpening and Brightness          8            6
Contrast and Sharpening            12           7
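
The counts in Table 8 can be turned into readable-file rates with a short calculation. The sketch below
assumes that every count is out of the 84 sample files used in this study; the function names are
hypothetical.

```python
from typing import Dict, Tuple

TOTAL_FILES = 84  # number of sample files used in the study

# Unreadable files per technique from Table 8: (Tesseract, G-Vision).
UNREADABLE: Dict[str, Tuple[int, int]] = {
    "Original": (12, 9),
    "Sharpening": (9, 6),
    "Contrast": (11, 8),
    "Sharpening and Brightness": (8, 6),
    "Contrast and Sharpening": (12, 7),
}


def readable_rate(unreadable: int, total: int = TOTAL_FILES) -> float:
    """Percentage of files for which the tool returned a readable result."""
    return 100.0 * (total - unreadable) / total


def unreadable_reduction(technique: str) -> Tuple[int, int]:
    """Unreadable files removed relative to the original images, per tool."""
    base_t, base_g = UNREADABLE["Original"]
    tess, gvis = UNREADABLE[technique]
    return (base_t - tess, base_g - gvis)


for name, (tess, gvis) in UNREADABLE.items():
    print(f"{name}: Tesseract {readable_rate(tess):.2f}%, "
          f"G-Vision {readable_rate(gvis):.2f}%")
```

Under this assumption, sharpening combined with brightness adjustment removes four unreadable files for
Tesseract and three for the Google Cloud Vision API relative to the original images.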


Table 9. List of Thai characters misinterpreted by the Tesseract and Google Cloud Vision API 


Actual Character Recognized Character 
Tesseract G-Vision 
a 3,5, š 
u U, U, %, V a, 8, v, u 
n v, %, A, v, 3, A 
n N, N, H 
a a a, a 
v Y, 5U u 
u U, U, My N u, ¥, A, S 
a a n,a 
n A, A, A n, ñ 
a a, A d,a 
a 4,9, a an 
y u, a 
N J 
a ao u 
a l a 
i) A 


4.2. Discussion and suggestion 

This section discusses some significant results and related issues and suggests possible ways to
make the tools more efficient at recognizing and interpreting results. The Google Cloud Vision API had a
higher accuracy than Tesseract across all image sizes and image characteristics. We also found the
following:
— English letters and Arabic numerals were recognized equally well by both tools.
— Both tools recognized high-definition images with high accuracy, whereas neither could recognize images
that were very small, taken from far away, or blurred.
— Thai vowels and tone marks caused the most frequent mistakes, followed by consonants that look similar
to each other.
— The original Thai vehicle registration certificate uses grey characters on a watermarked background,
which reduces the recognition performance.
— Some Thai vehicle registration certificates contain struck-through original text and handwritten entries,
which models designed for reading printed text cannot read.
— Some Thai vehicle registration certificates have specific vehicle information printed on the wrong label
lines, which leads to sequencing errors.
Tesseract is a flexible tool for developers because it is open-source software, so developers
can develop, customize, and manage it according to their individual needs. However, Tesseract can be a
little tricky to install and configure. The Google Cloud Vision API gives better performance than Tesseract
and offers a range of ready-made services that are easy to connect, configure, and run. However, the Google
Cloud Vision API may be less customizable because its services are built in. Comparing the accuracy with
that of the other articles on Thai language recognition detailed in Table 3, the methods in [33], [41]-[43]
used real cases as samples and achieved higher accuracy than the approach in this paper. However, a closer
look shows that article [41] recognized only the six-digit numbers on Thai lottery tickets, and articles
[33], [42], [43] recognized Thai signs and car license plates, which contain fewer and less complex
characters and lines than the Thai vehicle registration certificate. Moreover, Kitvimonrat and
Watcharabutsarakham [34] found that their tool often confuses certain pairs of similar-looking characters,
which is consistent with Table 9.


Next, we explore possible ways to make the above tools more efficient at recognizing and
interpreting results:

— The developer should define the boundary, frame, or pattern matching to reduce unnecessary reading 
information and avoid improperly positioned images. 

— The developer should adjust the character’s color or contrast before the recognizing process, including the 
removal of the additional watermark background. 

— The developer should develop a program that can link models supporting handwritten and printed text 
recognition, helping solve the reading issues with the Thai vehicle registration certificate. 

— Information retrieval methods or natural language processing methods can improve the accuracy of the 
post-processing results. 
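
The last suggestion can be illustrated with a minimal post-processing sketch that snaps each OCR token to
the closest entry in a vocabulary of known field values. It uses Python's standard `difflib` module; the
vocabulary, the similarity cutoff, and the function name are assumptions made for illustration, and a real
system would use Thai terms such as province names or car brands.

```python
import difflib
from typing import List, Optional


def correct_term(
    token: str, vocabulary: List[str], cutoff: float = 0.75
) -> Optional[str]:
    """Return the vocabulary entry most similar to the token.

    Returns None when no entry reaches the similarity cutoff, so that
    unknown tokens are left for manual review instead of being guessed.
    """
    matches = difflib.get_close_matches(token, vocabulary, n=1, cutoff=cutoff)
    return matches[0] if matches else None


# Hypothetical vocabulary of values expected in certificate fields.
VOCABULARY = ["TOYOTA", "HONDA", "NISSAN", "WHITE", "BLACK"]

for raw in ["T0YOTA", "WH1TE", "XXXXX"]:
    print(raw, "->", correct_term(raw, VOCABULARY))
```

A cutoff that is too low would map unrelated tokens onto valid terms, so the threshold would need tuning
against real certificate data.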


5. CONCLUSION 

This paper compares the Thai character recognition performance of Tesseract and the Google
Cloud Vision API as a step toward developing an automatic document recognition system for the Thai vehicle
registration certificate. The test results showed that Google Cloud Vision was more accurate than
Tesseract, with accuracies of 84.43% and 47.02%, respectively. This study recommends an image size of at
least 1024x768 px or a resolution of at least 300 dpi, preferably with sharpening and brightness adjustment
before processing, which can significantly improve the accuracy of the tools.